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Abstract 

We have devised a simple numerical technique to treat rugged data points that arise due to the 
"insufficient gain setting error" (or quantization error) of a digital instrument. This is a very 
wide spread problem that all experimentalists encounter some time or the other and they are 
forced to deal with it by suitable adjustments of instrument gains and other relevant 
parameters. But mostly this entails one to repeat the experiment - this may be inconvenient at 
the least. Here we prescribe a method that would actually attempt to smoothen the data set 
that is already so obtained. Our method is based on an entirely different algorithm that is not 
available anywhere else. This method mimics what one would do by intuitive visual 
inspection and not like the arcane digital filtering, spline fitting etc. that is available in the 
market. Nor does it depend on any instrumental parameter tweaking. This makes the program 
totally general purpose and also intellectually more satisfying. 
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Introduction 

In a real experiment there is always a problem with noise in the data set. Depending on the 
situation there are different techniques to remedy that as far as possible [1,2,3]. However, the 
advent of digital instrument has brought along with it a new type of error that was not there in 
the analog world. Nowadays most of the instruments in a research lab are essentially 
digitized. Apart from counters, timers and the like that are intrinsically digital, analog signals 
like current, voltage etc. are also digitized to help processing in the digital world. However, 
that brings with it a tradeoff too. Instead of the infinite variability of analog signals, we are 
now limited by digitization errors wherein the signal is represented by a collection of bits. 
Removing this error needs a different type of noise filtering program. The error (or noise) is 
accentuated in case of incorrect (or low) gain setting of the instrument, leading to the so 
called "insufficient gain error" or quantization error. In this case the instrument does not 
register small variations in the input till change is too large. During this time the output is 
held constant. Then the output changes to the next level and settles there till the input level 
change again shifts it up or down. This therefore introduces a systematic error in the 
measurement and should be rectified. 

There are various quantization errors as applicable to different systems and situations. 
Their remedies will also be as different. Here we report on a particular case only. However, 
we believe that the algorithm so developed is more general in nature and hence applicable in 
a wider sense. 

We take a typical case of a Lock-in Amplifier (Stanford SR830) taking data to study a 
phase transition. It is expected that during, before or after, the phase transition, there will be a 
huge signal change. But the amount of change expected (or even the direction of change in 
some cases) may not be known at the start. So one generally keeps a generous amount of 
lower sensitivity to counter any subsequent overloads during the run. This then generates the 
quantization errors at far away points from the transition temperatures. Now, this can be 
generally taken care of by alert programming which will switch the ranges or auto ranging 
the instrument, however, there may be times when this may not be desirable or intentionally 
not done. For one, an extra amount of logic has to be put in and tested/debugged. The more 
serious troubles are i) range switching takes a bit of time (dead time) to take affect (generally 
done by reed switches) and settle its output - data will not be acquired during this time; ii) 
there is always a range switching error, no matter how small. That is there is a small error in 
the output for the same input level at different range settings. Admittedly this is now more 
imperceptible in modern Lock-in Amplifiers than before. There is another such associated 
error, however. In case the output from "X" or "Y" or "Display Output" output socket is fed 
to a filter for further processing down the chain, this range switching must be accomplished 
by suitable resetting of timer capacitors (integrator or differentiator) else there will be a huge 
problem of over loading. This happens because this output always maintains full scale 
sensitivity to the full output, (for example, both IV and lmV FSD will give 10V output at 
this port). 

So in case one is forced to stay with a single gain throughout the experiment and there 
is a considerable change in the input level one gets staircases in the output. This is well 
known and called insufficient gain setting error or insufficient gain error. This limits the 
resolution of the system. Our aim in this paper is to devise a suitable algorithm to overcome 
it. This is best illustrated with a representative graph - shown in Figure 1. We can clearly see 
the steps at lower temperatures, much below the transition temperature. Here we would like 
to correct for them. 

The central idea of our scheme can be illustrated by focusing on a small section of the 
data. In Figure 2, we have taken out a small section of the previous graph and blown it up. 

It is obvious that the fine changes in the graph could not be grasped by the digital 
instrument and so it hung on to the previous value as long as possible till the input had 
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changed to more than or equal to one bit (least significant bit) of the digitized system. It is not 
a genuine property of the input signal here, but only an artifact. 

To illustrate our method clearly, let us take a hypothetical dataset of similar nature, 
but comprising of only three steps - as in Figure 3. The data points are shown in solid lines, 
whereas the dotted line represents the underlying curve. Let us focus on the middle section 
alone. We think that if we can somehow rotate the (ruler-like) straight line point set about the 
moment point, it will at least very closely follow the actual graph. This is the crux of the idea 
and the rest of the algorithm that follows is actually on the details of how we go about 
implementing the idea effectively in the case of a real data set. 



Details of the method 

Let us start with a stair that is not at the last or at the first of the whole set. The extremum set 
will be dealt with separately. So let us take the same step as before, from Figure 3. 

We see that around the moment point, the data points must be rotated, either up or 
down (depending on the side of the moment point, to which the data points belong). We are 
not considering the signs of Y and X axes here presently. Now, by rotation, we simply mean 
that we will change Y values of a point to Y', keeping the X constant. This is quite different 
from the conventional idea of rotation where (X, Y) pair is transformed to (X', Y') with the 
help of cosine and sine functions as in the general rotation in coordinate space. 

Now the first question is which is the moment point around which this rotation will 
take place. In the absence of any apriori knowledge of the data set, we have to take the 
midpoint of the step. To choose the guiding line we now find the midpoint of the next step. 
Since there are only two points, we can exactly fit one straight line between them. We assume 
this to be the target line. In reality this may be wrong, but we are adapting here due to two 
reasons: i) if the step jump size is small (like here) as compared to the total range of Y, a 
curve can be very well approximated by a piecewise linear approximation, and ii) the 
programming logic is the simplest and the approach is less controversial. 

So then, once we know the straight line, i.e., the slope and the intercept, we can 
transform Y to Y' for the same X, through these slope and intercept. This is done for the data 
points from the first set, that are towards the side of the next step, starting from the midpoint 
to the end on this side. For the next step, the points that are now considered are those on the 
left side of its midpoint. We thus hope that we have been able to replace the jagged edges 
with a continuous curve. This process is repeated till last step, save the extreme side of the 
entire data set. In case of the beginning of the entire data set, we similarly leave out the 
extreme half end of the step. Alternatively, the exclusion of the extremum partial steps can be 
made a subjective operation. In one version of the algorithm we made some minor changes to 
include the extreme points the smoothened out curve. Since we did not have a point after 
(before) the last (first) step to use as a second point for the linear fit, we extrapolated the line 
generated from the last (first) two steps respectively. However, this point needs serious 
observation for each set of data. 

The next major job is to figure out the length of the step. While this seems trivial, we 
found that for real life data sets it is a daunting task. The naive idea is to start from the 
beginning of the data set and then scan it onwards till the last read values differ from the 
previous values in Y. Initially we started with some known (by visual inspection) values. 
Frequently they are about l/10 th to Vi of the value of the step height itself. This works for 
noise free simulated data sets no doubt (for which even a step size of zero works). However it 
is a different story altogether when it is a real experiment data set. We found that: i) many 
times the end points are fudged due to +V2 bit errors and ii) there may be noisy spikes 
(anywhere in the step) which are obviously as big as the gap itself, and this spike may even 
be three or four points (x axis values) long, although frequently they are only one x axis value 
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long. Also there may be more than one noise spikes in the same step. There are important 
ramifications for these two effects. We discuss them later. 

For the first effect (points with half bit error at the beginning or end of a data set), 
obviously this smears the true end points of a step. This will make the program either latch on 
to the head or the tail of a set and this will go on for subsequent steps. The result is 
catastrophic. 

This occurs because the naive program could not properly decide to keep the erring 
data point in the previous or in the next set. We have decided to deal with it in the following 
way. We introduced a step parameter which will set a cut off (window) for the points and if a 
value is not within the window, it is considered to belong to the next step. This way it is 
accommodated in either of the steps. 

The second case is even more challenging. It is here that we could not provide a 
simple prescription. While single value excursions can be treated as noise and safely ignored 
(while trying to find out the length of the step) it is a multi step noise that creates the most 
confusion. While this is not free from objection, we decided to treat up to four error points as 
noise and otherwise treat the points to form a genuine step. However, we believe that human 
intervention may be needed to judiciously choose the step versus noise. 

While working on, we found that the step gap size may be variable also. That is, 
where the steps are more common and longer in size is where the graph is tapering off. But 
near a curvature, while the step length may be small, there may exhibit smaller gaps too. This 
will make the program get into a never ending loop if the prearranged step height parameter 
is maintained. To effectively counter this behavior, we think that the step size should be 
made self adjustable. However, this is not implemented in the current program. 

The final frontier is now round the corner. It is really where the curvature undergoes a 
peak or trough (like if the data set has only one step at the peak instead of a set of points or 
steps) we found that our program gives a sharp peak which is probably artificial. However, 
this can be overcome by using a second order polynomial to fit the mid points of the steps 
instead of the linear fit that we have used. However, this means that we should consider three 
successive steps and especially the book keeping becomes a bit more complicated. 

However, with these shortcomings included, we found the method gives a rather 
delightful smooth curve from the jagged curves that were thrown at it. We wrote the core 
program in FORTRAN 77 that is still now widely supported in most environments. The 
algorithm is given in appendix 1 . The computational demands are minimal, much faster than 
any of the conventional digital filtering routines that we know of. The routine does not invoke 
any special functions or arcane syntax, so it is possible to convert it into another high level 
language. If it is necessary to incorporate any graphing routine, or incorporate it into other 
packages (like an outline data acquisition and analysis program), it is eminently straight 
forward to do it. Finally the source code of the program can be obtained from the authors. 

We felt that we should take the program through a rigorous validation routine. We 
started off lightly by using a simulated data set with a few steps simulating a monotonously 
increasing function. Then we moved on to a simple curve with a peak having about 30 data 
points. Both the tests ran very smoothly. Then we created a data set of 500 points having the 
nature of a bell curve. The program gave a very valid result. Now to check whether the 
program was actually giving back a data set of the true nature of the original curve which 
would have been had it not been digitized, we generated smooth curves of linear, exponential 
and polynomial functions and digitized them using another algorithm that we wrote. These 
simulated data sets looked very much like real digitized graphs except for the fact that they 
were totally noise free and did not have half bit error. Then we used these digitized data to 
test our graph smoothing algorithm and superimposed the generated smooth curve on the 
original smooth curve. We got a perfect fit every time. Now we ventured into some real data 
sets which would invariably have some noise and half bit errors. Here we had to really put the 
step parameter to use here. The outcome, although not perfect, was quite satisfactory. We can 
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now validly claim to have smoothed out digitized curves. 

We now show the effect of this routine on a number of real life data: i) simulated and 
noise free but with steps (equation. Y=l-exp(ax) (Figure 4), ii) susceptibility of ferrite 
nanoparticles (Figures 1 and 5) and hi) resistance of a high temperature superconductor. 

Discussion and conclusion 

The major short corning of this present routine is that it actually accentuates the curve around 
a maximum or minimum point if the extremum consists of only one or two steps. As 
discussed previously, we think if we assume a three point parabolic fitting (considering three 
steps) the situation will definitely improve. However, this is more complicated than before, 
more for the book keeping involved. We are working on it. We are trying to incorporate the 
linear regime in general, except in cases where the curve passes through any extremum. 
However, in a noisy data set, this is no trivial task. 

In short, we claim to have made a simple idea into a useful routine for data smoothing. 
The method not only intutive, it is very fast (hardly any computational overhead) - so it is 
eminently suitable for incorporating into a real time data acquisition system. The simplistic 
nature of the program also makes it portable across all languages in any platform. 
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Appendix 1 
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x k = (xi+x i+m .i)/2 
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Interpolate xn< X; < Xk and yk_ 
i< y; < yk onto the straight line 
joining (x k ,y k ) and (x k -i,y k -i). 
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points between x k & 

x k -i 
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Note: This is the basic algorithm without any frills. To incorporate the step parameter all we 
have to do is replace the equalities and inequalities with "less than" and "greater than" 
operations. The calculation of the extremum values comes out of this itself because it is a 
simple matter of index manipulation rather that algorithm manipulation. (Xi,yO are the data 
points and (x k ,y k ) are the midpoints of the step. 
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Figure captions 

Figure 1. A representative data set (susceptibility of a ferrite nano particle system). 
Figure 2. Exploded view of the cutout section from Figure 1 . 
Figure 3. A hypothetical stair-patterned data set 

Figure 4 Validation of data, (a) Input data with steps, (b) Output smoothened data 

Figure 5. Smoothened data set of Figure 1. 

Figure 6. Resistance of a superconductor BSCCO-2223 
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Figure 1 . Ayan Paul et al. 



10 




11 



Figure 3. Ayan Paul et al. 
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Figure 5. Ayan Paul et al. 
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Figure 6. Ayan Paul et al. 
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